Technology of Text Mining

Internal identifier: 001A14 (Main/Exploration); previous: 001A13; next: 001A15

Source:

Lecture Notes in Computer Science [ISSN 0302-9743]; 2001.

RBID : ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B

Abstract

Abstract: A large amount of information is stored in databases, in intranets, or on the Internet. This information is organised as documents or as text documents; the difference depends on whether pictures, tables, figures, and formulas are included. The common problem is to find the desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally it has been considered under the heading of information seeking, that is, the science of how to find a book in a library, and it has been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The difficulty nowadays is that company databases, intranets, and the Internet contain many unclassified documents. First, some terms are defined. Text filtering is an information-seeking process in which documents are selected from a dynamic text stream. Text mining is a process of analysing text to extract information from it for particular purposes. Text categorisation is the process of clustering similar documents from a large document set. These terms overlap to a certain degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail, and the most common approaches are given. Finally, some examples of current uses of text mining are given, and some potential application areas are mentioned.
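The abstract defines text filtering as selecting documents from a dynamic text stream. As a minimal sketch only, and not the approach described in the paper, the code below assumes the simplest possible filter: a fixed keyword profile matched against each incoming document. The function name `filter_stream` and the `min_hits` threshold are hypothetical choices made for illustration.

```python
from typing import Iterable, Iterator


def filter_stream(docs: Iterable[str], profile: set[str], min_hits: int = 1) -> Iterator[str]:
    """Yield documents from a (possibly unbounded) stream that match a keyword profile.

    Hypothetical illustration: a document is kept when at least `min_hits`
    distinct profile terms occur in it.
    """
    # Normalise the profile once so matching is case-insensitive.
    wanted = {term.lower() for term in profile}
    for doc in docs:
        # Crude tokenisation: split on whitespace, strip surrounding punctuation, lowercase.
        tokens = {tok.strip(".,;:!?()\"'").lower() for tok in doc.split()}
        # Count how many distinct profile terms appear in this document.
        if len(wanted & tokens) >= min_hits:
            yield doc


if __name__ == "__main__":
    stream = iter(["Text mining of patents.", "Weather report for Tampere."])
    for kept in filter_stream(stream, {"mining", "patents"}):
        print(kept)
```

Because `filter_stream` is a generator, it consumes its source lazily and so suits a stream that never ends. A practical filter would likely replace exact keyword matching with stemming, weighting, or a learned relevance model.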

URL:

https://api.istex.fr/document/A9D55CDEED0425A739C61C52479F43C882308A8B/fulltext/pdf

DOI: 10.1007/3-540-44596-X_1

Affiliations:

Finland

Links to previous steps (curation, corpus, etc.)

to stream Istex, to step Corpus: 000317
to stream Istex, to step Curation: 000312
to stream Istex, to step Checkpoint: 001068
to stream Main, to step Merge: 001B07
to stream Main, to step Curation: 001A14

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct:series"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Technology of Text Mining</title>
<author><name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B</idno>
<date when="2001" year="2001">2001</date>
<idno type="doi">10.1007/3-540-44596-X_1</idno>
<idno type="url">https://api.istex.fr/document/A9D55CDEED0425A739C61C52479F43C882308A8B/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000317</idno>
<idno type="wicri:Area/Istex/Curation">000312</idno>
<idno type="wicri:Area/Istex/Checkpoint">001068</idno>
<idno type="wicri:doubleKey">0302-9743:2001:Visa A:technology:of:text</idno>
<idno type="wicri:Area/Main/Merge">001B07</idno>
<idno type="wicri:Area/Main/Curation">001A14</idno>
<idno type="wicri:Area/Main/Exploration">001A14</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Technology of Text Mining</title>
<author><name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
<affiliation wicri:level="1"><country xml:lang="fr">Finlande</country>
<wicri:regionArea>Tampere University of Technology, FIN-33101, P.O. Box 553, Tampere</wicri:regionArea>
<wicri:noRegion>Tampere</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Finlande</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2001</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">A9D55CDEED0425A739C61C52479F43C882308A8B</idno>
<idno type="DOI">10.1007/3-540-44596-X_1</idno>
<idno type="ChapterID">1</idno>
<idno type="ChapterID">Chap1</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: A large amount of information is stored in databases, in intranets, or on the Internet. This information is organised as documents or as text documents; the difference depends on whether pictures, tables, figures, and formulas are included. The common problem is to find the desired piece of information, a trend, or an undiscovered pattern in these sources. The problem is not a new one. Traditionally it has been considered under the heading of information seeking, that is, the science of how to find a book in a library, and it has been solved either by classifying and accessing documents with the Dewey Decimal Classification system or by assigning a number of characteristic keywords. The difficulty nowadays is that company databases, intranets, and the Internet contain many unclassified documents. First, some terms are defined. Text filtering is an information-seeking process in which documents are selected from a dynamic text stream. Text mining is a process of analysing text to extract information from it for particular purposes. Text categorisation is the process of clustering similar documents from a large document set. These terms overlap to a certain degree. Text mining, also known as document information mining, text data mining, or knowledge discovery in textual databases, is an emerging technology for analysing large collections of unstructured documents in order to extract interesting and non-trivial patterns or knowledge. Typical subproblems that have been solved are language identification, feature selection/extraction, clustering, natural language processing, summarisation, categorisation, search, indexing, and visualisation. These subproblems are discussed in detail, and the most common approaches are given. Finally, some examples of current uses of text mining are given, and some potential application areas are mentioned.</div>
</front>
</TEI>
<affiliations><list><country><li>Finlande</li>
</country>
</list>
<tree><country name="Finlande"><noRegion><name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</noRegion>
<name sortKey="Visa, Ari" sort="Visa, Ari" uniqKey="Visa A" first="Ari" last="Visa">Ari Visa</name>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001A14 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001A14 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:A9D55CDEED0425A739C61C52479F43C882308A8B
   |texte=   Technology of Text Mining
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	Exploration server on OCR
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.